Skip to content

fix: harden retool rollout against multi-turn / retry desync#1861

Merged
zhuzilin merged 1 commit into
THUDM:mainfrom
leofan-lab:fix/retool-multi-turn-safety
May 11, 2026
Merged

fix: harden retool rollout against multi-turn / retry desync#1861
zhuzilin merged 1 commit into
THUDM:mainfrom
leofan-lab:fix/retool-multi-turn-safety

Conversation

@leofan-lab
Copy link
Copy Markdown
Contributor

Summary

Four small fixes to examples/retool/generate_with_retool.py that protect one invariant across multi-turn tool-calling rollouts: sample.rollout_log_probs, loss_masks, response_token_ids, and response must stay the same length. Any of these getting out of sync crashes slice_log_prob_with_cp in the trainer with a confusing length-mismatch error.

When this fires

Hit during async retool training runs once samples started cycling through the rollout manager's retry path (aborted → re-enqueued) and once tool outputs got large enough to push past the context cap. Four separate ways the invariant broke in practice; four small fixes.

Fixes

  1. Reset stale sample state at entry. Retried samples arrive with rollout_log_probs / response / loss_mask still populated from the prior attempt; the main loop appends onto the stale list and desyncs from the fresh response_token_ids. Clear all four fields up front.

  2. Clamp per-turn max_new_tokens to the remaining context budget. A turn can otherwise append up to rollout_max_response_len tokens on top of a total already near max_context_length, producing samples larger than the training-side per-partition cap.

  3. Abort when sglang returns text without output_token_logprobs. The old fallback retokenized the text, which grows response_token_ids without a matching log-probs entry. Return ABORTED so the rollout manager re-queues the group cleanly instead of poisoning the trainer.

  4. Trim post-tool-output overflow. Fix [rollout] feat: implement partial rollout feature on rollout engine side #2 clamps the model's generation, but tool output (e.g. a large print() from code_interpreter) is appended unconstrained. Trim response_token_ids / loss_masks / rollout_log_probs together to keep lengths aligned, re-decode response so the text field matches, and mark the sample TRUNCATED.

Risk

  • Blast radius: low — file is under examples/, consumers opt in via --custom-generate-function-path.
  • Behavior change: fix fix typos in example docs #3 is the notable one. Pre-patch, a sample with missing logprobs would silently desync the trainer (eventually crash, but far from the cause). Post-patch, the same condition aborts and re-queues. Operators should treat an uptick in ABORTED samples here as a signal to check sglang version / --return-logprob routing.

Tests

None added. examples/retool/ has no unit-test coverage in the repo today; a full integration test would require mocking sglang + the tool sandbox. Each fix's failure mode is described concretely in the commit message and inline comments, and is reproducible by hand with a 0.5B model + retool yaml + a deliberately noisy tool.

Keeps sample.rollout_log_probs, loss_masks, response_token_ids, and response
length-aligned across turns. Four fixes:

1. Reset stale sample state at entry so a retried (previously aborted)
   sample doesn't concat new tokens onto old log-probs.

2. Clamp per-turn max_new_tokens to the remaining context budget so
   total_length can't blow past max_context_length, producing samples
   larger than the training-side per-partition cap.

3. Abort when sglang returns text without output_token_logprobs. The old
   fallback retokenized, which grows response_token_ids without matching
   log-probs — abort lets the rollout manager re-queue cleanly.

4. Trim post-tool-output overflow. Fix THUDM#2 clamps the model's generation,
   but tool output is appended unconstrained. Trim tokens / loss_masks /
   log-probs together and mark TRUNCATED.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zhuzilin zhuzilin merged commit 41dc3b6 into THUDM:main May 11, 2026
SamitHuang added a commit to SamitHuang/slime that referenced this pull request May 17, 2026
* temp save rfc

Signed-off-by: SamitHuang <285365963@qq.com>

* add plan

Signed-off-by: SamitHuang <285365963@qq.com>

* update

Signed-off-by: SamitHuang <285365963@qq.com>

* [docker] remove true on policy patches (THUDM#1661)

Co-authored-by: Copilot <copilot@github.com>

* [fix]: Qwen3.5-35B-A3B 8-GPU: set TP size to 2 for num_query_groups=2 (THUDM#1662)

* Remove FSDP support (THUDM#1664)

Co-authored-by: Copilot <copilot@github.com>

* docs: add OpenClaw-RL to projects built upon slime (THUDM#1635)

* qwen2.5 0.5b non-colocate (first attempt ok, but nccl error later)

Signed-off-by: samithuang <285365963@qq.com>

* add convert script

* add setup doc

* Support setting update weights in sglang_config (THUDM#1665)

Co-authored-by: Copilot <copilot@github.com>

* fix nccl error by NcclBridge subprocess

* eliminate gpu to cpu weight transfer

Signed-off-by: samithuang <285365963@qq.com>

* Revise weight synchronization strategy in goal plan

Reorder weight synchronization support for colocate and non-colocate scenarios in the goal plan.

* [fix] Fix numerical accuracy issue in dynamic sampling filter (THUDM#1674)

* sync from internal (THUDM#1677)

Co-authored-by: Copilot <copilot@github.com>

* bugfixes from community (THUDM#1678)

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: yueming-yuan <yym022502@gmail.com>
Co-authored-by: coding-famer <chenhegu0109@gmail.com>

* Fix: pass return_tensors in text_kwargs for transformers>=5.0.0 compatibility (THUDM#1648)

* Fix missing packed_seq_params in bshd qkv_format (THUDM#1649)

* [Multimodal][Model] Qwen3.5 VL training example/support (THUDM#1676)

* update docs (THUDM#1680)

Co-authored-by: Copilot <copilot@github.com>

* update docs (THUDM#1681)

Co-authored-by: Copilot <copilot@github.com>

* support offloading non-updatable server (THUDM#1668)

Co-authored-by: Copilot <copilot@github.com>

* bugfix (THUDM#1685)

Co-authored-by: Copilot <copilot@github.com>

* fix: handle Qwen3.5 in quantize_params_fp8 (THUDM#1683)

* bugfix (THUDM#1687)

Co-authored-by: Copilot <copilot@github.com>

* Fix Qwen3.5 & Qwen3-Next linear attention cu_seqlens missing (THUDM#1686)

Co-authored-by: benyi <huangliangmeng.hlm@alibaba-inc.com>

* fix: use semantic version comparison for PyTorch >= 2.6 detection (THUDM#1667)

* [Fix] Minor fix for properly finishing / flushing wandb logging metrics at exit (THUDM#1592)

Co-authored-by: Zilin Zhu <zhuzilinallen@gmail.com>

* Autofix/issue 1578 hf2megatron arg suffix (THUDM#1636)

* bugfix (THUDM#1688)

Co-authored-by: Copilot <copilot@github.com>

* fix(examples): update strands_sglang example to v0.3.x API (THUDM#1684)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* [docker] cherry pick qwen3.5 bugfix (THUDM#1691)

Co-authored-by: Copilot <copilot@github.com>

* bugfix/fix Qwen3.5 dense model precision bug in TP_SIZE>1 from sglang (THUDM#1705)

* Fix/qwen3 5 mtp bridge (THUDM#1702)

Co-authored-by: benyi <huangliangmeng.hlm@alibaba-inc.com>

* support epd for glm4.6v (THUDM#1704)

* [docker] support epd for glm4.6v (THUDM#1707)

Co-authored-by: Copilot <copilot@github.com>

* remove script

* [docker] store v0.5.9 patch (THUDM#1710)

Co-authored-by: Copilot <copilot@github.com>

* Add GLM-4.7-Flash MTP training support (THUDM#1712)

* [release] bump to v0.2.3 (THUDM#1682)

Co-authored-by: Copilot <copilot@github.com>

* feat: add GLM-4.6V MoE VL bridge with CP support (THUDM#1715)

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix: resolve rope_theta from rope_parameters dict in HF config validation (THUDM#1720)

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* [docker] patches for glm4.6v, kimi k2.5 and dsa cp only (THUDM#1722)

Co-authored-by: Copilot <copilot@github.com>

* [docker] support IndexCache

* Fix CUDA IPC cache leaks during weight updates (THUDM#1731)

Co-authored-by: Copilot <copilot@github.com>

* [docker] update megatron (THUDM#1729)

Co-authored-by: Copilot <copilot@github.com>

* [docker] Fix IndexCache with mla model (THUDM#1736)

Co-authored-by: Copilot <copilot@github.com>

* [slime-router] support pd disaggregation and remove radix tree middleware (THUDM#1735)

* Fix glm4v megatron bridge (THUDM#1738)

Co-authored-by: Copilot <copilot@github.com>

* [docker] update sglang patch (THUDM#1743)

Co-authored-by: Copilot <copilot@github.com>

* feat: GLM4V multimodal support improvements (THUDM#1745)

Co-authored-by: Copilot <copilot@github.com>

* feat: placeholder worker type, metrics router, and GPQA letter range (THUDM#1746)

Co-authored-by: Copilot <copilot@github.com>

* always enable_metrics and remove dp context (THUDM#1747)

Co-authored-by: Copilot <copilot@github.com>

* fix: resolve SP/CP gradient inflation in FLA (linear attention) layers (THUDM#1748)

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Update MTP example configs, rename GLM-4.5 to GLM-4.7, clean scripts (THUDM#1749)

Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Support qwen3.5 loss mask for multi-turn SFT (THUDM#1742)

Co-authored-by: benyi <huangliangmeng.hlm@alibaba-inc.com>

* fix: propagate moe_token_dispatcher_type in bridge model provider (THUDM#1737)

* fix: resolve rope_theta from rope_parameters in DeepseekV32Bridge (THUDM#1734)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* chore: translate remaining Chinese comments to English (THUDM#1726)

* feat: add Qwen3.5-4B model support (THUDM#1721)

* fix: http_utils. disable system proxy for internal SGLang httpx clients (THUDM#1714)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: auto-detect GPUs in qwen3-4b script (THUDM#1700)

* fix: quote `$MOE_LAYER_FREQ` (THUDM#1689)

* disable router health_check and allow prompt_data is None (THUDM#1751)

Co-authored-by: Copilot <copilot@github.com>

* Router for vllm (#5)

* Draft router design

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Add vllm router

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Add router to script

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix gpu memory utilization

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix output token ids

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Add more nccl flag

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix bug

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

---------

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* small fix on qwen3-235b-a22b launch script (THUDM#1719)

* sync internal bugfix (THUDM#1765)

Co-authored-by: Copilot <copilot@github.com>

* Fix uploading sglang metrics to wandb (THUDM#1768)

Co-authored-by: Copilot <copilot@github.com>

* use zhuzilin/sgl-router for sglang-router (THUDM#1770)

Co-authored-by: Copilot <copilot@github.com>

* [docker] update sgl-router (THUDM#1772)

Co-authored-by: Copilot <copilot@github.com>

* [Multimodal] Add Multimodal OPD support (THUDM#1760)

* refactor: remove slime router (THUDM#1773)

Co-authored-by: Copilot <copilot@github.com>

* Add rollout trace timeline viewer (THUDM#1776)

Co-authored-by: Hanyu Zhang <hanyu.zhang@aminer.cn>

* [Fix] Fix duplicate Megatron LR scheduler resume when optimizer state is not loaded (THUDM#1775)

* Support FP8 conversion for Qwen3.5 (THUDM#1769)

* fix typo (THUDM#1759)

Co-authored-by: shiqirui <shiqirui@kupasai.com>

* [Fix]Fix some bugs/clean up (THUDM#1756)

* (fix):not have encoder_only attr cause run failed (THUDM#1741)

Co-authored-by: wangch <wangch@wangchdeMacBook-Air.local>

* update docs

* remove redundant envvar

* some minor cleanup

* [release] bump to v0.2.4  (THUDM#1777)

Co-authored-by: Copilot <copilot@github.com>

* Plan refactor vllm/sglang

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Code implemented

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix bug

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix bug

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix bug

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix port

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix config

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix bug MOE weight sync

* Fix bug vllm transfer weight

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix weight sync

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix config

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Change name config

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* pass critic role through to create RayTrainGroup (THUDM#1797)

* fix qwen3.5 397B converting error when enable expert parallel (THUDM#1799)

Co-authored-by: 周鹤云 <zhouheyun@xiaohongshu.com>

* fix(geo3k-vlm-sft): remove --apply-chat-template from SFT launch script (THUDM#1791)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add host memory metrics to available_memory function (THUDM#1764)

* [WIP] fix loss oom (THUDM#1788)

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* sync from internal (THUDM#1805)

* sync from internal (THUDM#1807)

* feat: add npu patch for qwen3-vl-8b grpo & ppo (THUDM#1750)

Signed-off-by: cjy0x <isjunyi.chen@gmail.com>
Co-authored-by: shiyuan680 <917935075@qq.com>
Co-authored-by: PengchengShi00 <spc117369@gmail.com>

* fix missing position_ids in log-prob forward step (THUDM#1809)

* feat: add support for including missing weights from origin HF checkp… (THUDM#1812)

* [Fix] Initialize grad_norm before found_inf skip path (THUDM#1762)

* [conda] Add install custom sgl-router to build_conda.sh (THUDM#1813)

* Revert no_grad for entropy to prevent comm stuck in dsa (THUDM#1822)

* Add fallback for get_seqlen_balanced_partitions (THUDM#1823)

* Resolve review

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Try colocated vllm weight

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* docs: add Relax to notable projects in README (THUDM#1834)

* Bugfix: use cpu instead of cuda in convert_torch_dist_to_hf.py when --add-missing-from-origin-hf is set (THUDM#1828)

* [fix] eval sample logging when sample is a list (THUDM#1836)

* [Draft] Local runable dev

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* [Fix]  Fix cuda-python pin in build_conda.sh (THUDM#1827)

* fix entropy bug and update code (THUDM#1846)

* Revert "Add fallback for get_seqlen_balanced_partitions" (THUDM#1848)

* fix (THUDM#1849)

* Fix offload train

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Add support for NVIDIA DGX Spark (GB10 / sm_121a, arm64) (THUDM#1835)

* Fix offload train

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix offload_rollout

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix vllm offload

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix offload traing

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix offload weight

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix offload weight

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* refactor/ppo (THUDM#1856)

* [docker] cleanup sglang patch (THUDM#1859)

* [docker] update v0.5.9 patch

* Rename critic config to megatron config (THUDM#1866)

* [Fix] Use Ray ObjectRef await instead of asyncio.to_thread in distributed POST (THUDM#1873)

* chore: include length context in slice_log_prob_with_cp assert (THUDM#1862)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* [docker] upgrade megatron to 1dcf0dafa (THUDM#1867)

* fix ppo value head load bugs (THUDM#1878)

* [docker] upgrade sglang to v0.5.10.post1 (THUDM#1874)

* [docs] update docs

* [docker] update megatron-bridge and add qwen3.6 tests (THUDM#1884)

* fix lint

* Fix(checkpoint): add resume/pause in save_model() for offload_train (fixes THUDM#1886) (THUDM#1888)

* fix ppo value offload bugs (THUDM#1882)

* fix qwen3.6 hf config validation bug (THUDM#1889)

* Add missing metrics to log (THUDM#1890)

* fix(qwen3_next): use torch.get_default_dtype() — get_current_dtype do… (THUDM#1883)

Co-authored-by: yeqinghe <yeqinghe@MacBook-Pro-6.local>

* Fix location error in install script (THUDM#1877)

* Only allow --allgather-cp for DSA model (THUDM#1891)

* Migrate internal feature (THUDM#1897)

* [Fix]  Fix distributed POST actor concurrency split (THUDM#1880)

Co-authored-by: Zilin Zhu <zhuzilinallen@gmail.com>

* Fix CI: update rollout_data_postprocess plugin contract for new call site (THUDM#1902)

Co-authored-by: jingshenghang <shenghang.jing@aminer.cn>

* Patch Megatron TP grad coalesce to chunked all-reduce (THUDM#1899)

* fix: harden retool rollout against multi-turn / retry desync (THUDM#1861)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix log file

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix import engine group

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

* Fix rebase code

Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>

---------

Signed-off-by: SamitHuang <285365963@qq.com>
Signed-off-by: samithuang <285365963@qq.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: cjy0x <isjunyi.chen@gmail.com>
Co-authored-by: SamitHuang <285365963@qq.com>
Co-authored-by: Zilin Zhu <zhuzilinallen@gmail.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: none0663 <none0663@outlook.com>
Co-authored-by: Yinjie Wang <yinjie@uchicago.edu>
Co-authored-by: Fengqing Jiang <43953876+Django-Jiang@users.noreply.github.com>
Co-authored-by: yueming-yuan <yym022502@gmail.com>
Co-authored-by: coding-famer <chenhegu0109@gmail.com>
Co-authored-by: Lawrence Wu <lawrence.wu@harmonic.fun>
Co-authored-by: huang3eng <huang3eng@gmail.com>
Co-authored-by: benyi <huangliangmeng.hlm@alibaba-inc.com>
Co-authored-by: Aaron Batilo <AaronBatilo@gmail.com>
Co-authored-by: Silun Wang <igeekwang@gmail.com>
Co-authored-by: Chengxing Xie <91449279+yitianlian@users.noreply.github.com>
Co-authored-by: Yuan He <33579950+Lawhy@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Mor Zusman <mor.zusmann@gmail.com>
Co-authored-by: append-only <shw20010329@163.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Nan Jiang <59716405+nanjiangwill@users.noreply.github.com>
Co-authored-by: Xuan Wang <49010704+stevewx@users.noreply.github.com>
Co-authored-by: Hubert Wang <huberthyw@gmail.com>
Co-authored-by: Hou Shihao <shhou007@gmail.com>
Co-authored-by: DongzhuoranZhou <110855293+DongzhuoranZhou@users.noreply.github.com>
Co-authored-by: Ailuntz <130897222+ailuntz@users.noreply.github.com>
Co-authored-by: Zhuohao Li <garrick0508@gmail.com>
Co-authored-by: Hanyu Zhang <hanyu.zhang@aminer.cn>
Co-authored-by: Kang Yu <kangy.me@gmail.com>
Co-authored-by: peterjc123 <peter_jiachen@163.com>
Co-authored-by: qrskannbara <94727257+albaNnaksqr@users.noreply.github.com>
Co-authored-by: shiqirui <shiqirui@kupasai.com>
Co-authored-by: wangyufak <wangch9@xiaopeng.com>
Co-authored-by: wangch <wangch@wangchdeMacBook-Air.local>
Co-authored-by: Xintong Li <znculee@gmail.com>
Co-authored-by: TM <tianmingxu.tmxu@gmail.com>
Co-authored-by: 周鹤云 <zhouheyun@xiaohongshu.com>
Co-authored-by: LiLei <77353389+lilei199908@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: cjy0x <isjunyi.chen@gmail.com>
Co-authored-by: shiyuan680 <917935075@qq.com>
Co-authored-by: PengchengShi00 <spc117369@gmail.com>
Co-authored-by: 杨睿 <595403043@qq.com>
Co-authored-by: Mathew Han <49226490+mathewjhan@users.noreply.github.com>
Co-authored-by: haoxuanJIA <116806014+boots-coder@users.noreply.github.com>
Co-authored-by: ryang <38470282+ryang-max@users.noreply.github.com>
Co-authored-by: Leo Fan <84952531+leofan-lab@users.noreply.github.com>
Co-authored-by: Long Yijun <156500868+Procrastinatorrrr@users.noreply.github.com>
Co-authored-by: HeatherLiuzh <heather996lzh@gmail.com>
Co-authored-by: yeqinghe <yeqinghe@MacBook-Pro-6.local>
Co-authored-by: tao W <122036357+selfanti@users.noreply.github.com>
Co-authored-by: jingshenghang <48083555+jingshenghang@users.noreply.github.com>
Co-authored-by: jingshenghang <shenghang.jing@aminer.cn>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants